
feat(installer): zeta-hardware-detect.ts — TS module for GPU+storage+CPU+memory classification (24 unit tests; Rule 0 discipline) #5642

Merged
AceHack merged 1 commit into main from feat/zeta-hardware-detect-ts-rule-0-discipline-detects-gpu-storage-cpu-classes-2026-05-27 on May 27, 2026

Conversation

@AceHack

@AceHack AceHack commented May 27, 2026

Extracts the inline lspci heuristic from zeta-install.sh (PR #5635) into a testable TS module, per Rule 0 (TS-over-bash). Scope is extended: the module now also detects storage shape (NVMe/SSD/HDD plus device count), CPU class (nproc + vendor_id), and memory (GB). The --suggested-host flag prints one of control-plane / worker-gpu / worker-template for capture via bash $(...). 24 unit tests exercise the pure-logic exports (no I/O during tests). This PR does NOT yet modify zeta-install.sh, to stay out of the way of in-flight #5638 + #5640; a follow-up commit will replace the inline lspci block with a bun invocation.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 27, 2026 21:05
@AceHack AceHack enabled auto-merge (squash) May 27, 2026 21:05
@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@AceHack AceHack merged commit 0e5fd04 into main May 27, 2026
31 of 33 checks passed
@AceHack AceHack deleted the feat/zeta-hardware-detect-ts-rule-0-discipline-detects-gpu-storage-cpu-classes-2026-05-27 branch May 27, 2026 21:08
AceHack added a commit that referenced this pull request May 27, 2026
…xedAccess errors (#5642 lint fwd) (#5645)

PR #5642 landed the TS module + 24 unit tests. The `lint (tsc tools)`
non-required check failed with 2 strict-mode errors after merge:

- classifyStorage line 134: `name.startsWith("nvme")` — destructure
  `const [name, rotaStr] = cols;` types as `string | undefined`
  under noUncheckedIndexedAccess even after `cols.length < 3` check
- parseMemoryGb line 159: `parseInt(m[1], 10)` — regex capture-group
  typed as `string | undefined` under same flag

Both sites are logically safe (one is guarded by a length check, the
other by a null check), but under noUncheckedIndexedAccess tsc cannot
narrow through the array destructure or the indexed capture-group
access without an explicit narrow.

Fix: explicit-narrow before use in both places. Replaces destructure
with indexed access + explicit `=== undefined` guard; replaces inline
`m[1]` with intermediate `kbStr` constant + guard.

No behavior change (tests still 24/24 pass; classify/parse output
identical). Pure type-narrowing for tsc strict compliance.

Future TS additions to tools/installer/ inherit the discipline:
explicit-narrow before use; don't rely on length-check to satisfy
strict noUncheckedIndexedAccess.
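
The explicit-narrow pattern described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the `StorageKind` type, the function name `classifyStorageLine`, the "NAME ROTA TYPE" row layout, and the `MemTotal: <n> kB` input format are all assumptions made for the example. Only `cols`, `name`, `rotaStr`, `m[1]`, and `kbStr` come from the commit message.

```typescript
// Hypothetical result type; the real module's types are not shown in this PR.
type StorageKind = "nvme" | "ssd" | "hdd";

// Assumes one whitespace-separated "NAME ROTA TYPE" row per device (assumed layout).
function classifyStorageLine(line: string): StorageKind | undefined {
  const cols = line.trim().split(/\s+/);
  if (cols.length < 3) return undefined;
  // Under noUncheckedIndexedAccess, cols[0] is `string | undefined` even after
  // the length check, so each element is narrowed explicitly before use.
  const name = cols[0];
  const rotaStr = cols[1];
  if (name === undefined || rotaStr === undefined) return undefined;
  if (name.startsWith("nvme")) return "nvme";
  return rotaStr === "1" ? "hdd" : "ssd";
}

// Assumes a "MemTotal: <n> kB" line as input (assumed format).
function parseMemoryGb(meminfo: string): number | undefined {
  const m = /MemTotal:\s+(\d+)\s+kB/.exec(meminfo);
  if (m === null) return undefined;
  // The capture group is also `string | undefined` under the flag:
  // bind it to an intermediate constant and guard it.
  const kbStr = m[1];
  if (kbStr === undefined) return undefined;
  return Math.round(parseInt(kbStr, 10) / (1024 * 1024));
}
```

The guards are unreachable at runtime when the length and null checks pass, which is why the change is type-only.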

Composes with #5642 (the file the lint warning was filed against).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 27, 2026
…dware-detect.ts (extends GPU-only inline lspci → GPU+storage+CPU classification with fallback) (#5646)

PR #5642 shipped the TS module + 24 unit tests for hardware
classification (GPU + storage shape + CPU vendor + memory + suggested
host). This commit wires the install.sh menu to call it for
suggested-host, replacing the inline lspci-only heuristic.

The TS module's logic is strictly richer than the inline heuristic it replaces:

| Heuristic              | Old (inline lspci)        | New (TS module)                        |
|------------------------|---------------------------|----------------------------------------|
| GPU detected           | → worker-gpu              | → worker-gpu                           |
| ≥4 disks + ≥64GB RAM   | not detected              | → worker-template (storage-heavy)      |
| ≥16 cores + ≥32GB RAM  | not detected              | → worker-template (CPU-heavy)          |
| Default                | → control-plane           | → control-plane                        |
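
The "New (TS module)" column of the table can be read as a small decision function. The sketch below is an assumption-laden illustration: the `HardwareSummary` shape, its field names, and the function name `suggestHost` are hypothetical, and the rule precedence (GPU first, then storage-heavy, then CPU-heavy) simply follows the table's row order rather than anything confirmed by the module's source.

```typescript
type SuggestedHost = "control-plane" | "worker-gpu" | "worker-template";

// Hypothetical summary shape; field names are illustrative only.
interface HardwareSummary {
  hasGpu: boolean;
  diskCount: number;
  cpuCores: number;
  memoryGb: number;
}

function suggestHost(hw: HardwareSummary): SuggestedHost {
  // Row 1: any detected GPU suggests a GPU worker.
  if (hw.hasGpu) return "worker-gpu";
  // Row 2: storage-heavy (at least 4 disks and at least 64 GB RAM).
  if (hw.diskCount >= 4 && hw.memoryGb >= 64) return "worker-template";
  // Row 3: CPU-heavy (at least 16 cores and at least 32 GB RAM).
  if (hw.cpuCores >= 16 && hw.memoryGb >= 32) return "worker-template";
  // Row 4: default.
  return "control-plane";
}
```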

Composition path:

1. SCRIPT_DIR resolves the script's own directory
2. HWDETECT_REPO_ROOT = $SCRIPT_DIR/../.. (two-dirs-up from
   full-ai-cluster/usb-nixos-installer/ → repo root)
3. HWDETECT_TS = $HWDETECT_REPO_ROOT/tools/installer/zeta-hardware-detect.ts
4. If `bun` on PATH AND TS file exists → run `bun ... --suggested-host`
5. If unavailable OR returns empty → fall back to original inline
   lspci-only heuristic (degraded but functional)
6. Menu text + default-choice logic unchanged; only the SUGGESTED_HOST
   computation source differs

The fallback ensures the menu still works in degraded environments
(no bun on PATH, missing TS file, TS module crash) — operator can
still pick a host attribute via the numbered menu. Substrate-honest
disclosure: the fallback's GPU-only heuristic IS less precise than
the TS module's GPU+storage+CPU classification, so falling back loses
the storage-heavy and CPU-heavy detection — but doesn't break flow.
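
The fallback decision itself lives in bash inside zeta-install.sh; the TypeScript sketch below only models its logic for illustration. The names `resolveSuggestedHost`, `HostChoice`, and `gpuSeenByLspci` are hypothetical. Treating an unrecognized (non-empty) value the same as an empty one is an assumption; the commit message only specifies the "unavailable OR returns empty" cases.

```typescript
const VALID_HOSTS = ["control-plane", "worker-gpu", "worker-template"] as const;
type HostChoice = (typeof VALID_HOSTS)[number];

// tsModuleOutput: captured stdout of the TS module, or undefined when bun or
// the TS file is unavailable. gpuSeenByLspci: result of the inline lspci check.
function resolveSuggestedHost(
  tsModuleOutput: string | undefined,
  gpuSeenByLspci: boolean,
): HostChoice {
  const trimmed = (tsModuleOutput ?? "").trim();
  const match = VALID_HOSTS.find((h) => h === trimmed);
  if (match !== undefined) return match;
  // Degraded path: GPU-only heuristic, so storage-heavy and CPU-heavy
  // nodes are no longer distinguished from control-plane.
  return gpuSeenByLspci ? "worker-gpu" : "control-plane";
}
```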

Menu output now distinguishes the three suggestion classes:

- worker-gpu  → "GPU detected — likely worker node"
- worker-template → "storage-heavy OR CPU-heavy node — customize per
                    PROVISIONING.md cookie-cutter workflow"
- control-plane → "no GPU + not storage/CPU-heavy — defaulting to
                   control-plane"

Validation:
- bash -n syntax check passed
- Docker harness (bun tools/ci/docker-nixos-install-sh-test.ts) passed
  in 15s

Composes with:
- #5642 (TS module + tests landing)
- #5635 (cluster-type menu extension that established the
  numbered-menu structure this commit upgrades)
- B-0857.3 (next: factor zeta-install.sh body into callable
  nixos-install-from-usb.sh)

Closes the operator's request ("getting the menu fixed so it has all
the cluster types we talked about — storage cpu gpu etc... and letting
you select multiple or detecting based on hardware etc...") at the
detection-based-on-hardware scope only. Multi-select of cluster types
remains a future B-0792 extension (it requires a flake-shape refactor
to support role-tagging per node).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 27, 2026
…inding on #5640 — restore service never fired) (#5644)

* fix(b-0852.4): correct cred-blob default path /esp → /boot to match installed-system ESP mount (Copilot finding on #5640)

PR #5640 shipped credsRestore.enable=true with blobPath defaulting to
`/esp/zeta-creds.enc`. Copilot review flagged the substrate-honest bug:

- At INSTALL-TIME (live USB), zeta-install.sh's Step 6.95-picker writes
  the blob to `/esp/zeta-creds.enc` because the live installer mounts
  the target ESP at `/esp`
- POST-REBOOT, disko (`disko-shapes/2nvme.nix`) mounts the SAME ESP
  partition at `/boot` per `mountpoint = "/boot"`
- The blob is the same physical file on the same ESP partition, but
  the mount path differs by context

The restore service runs POST-REBOOT, where the file is at
`/boot/zeta-creds.enc` — NOT `/esp/zeta-creds.enc`. So:

- ConditionPathExists = "/esp/zeta-creds.enc" always evaluates FALSE
  on the installed system (`/esp` doesn't exist post-reboot)
- systemd silently skips the unit (condition unmet)
- restore-from-cred-blob NEVER FIRES on any installed node
- creds are never restored at boot
- operator has to manually re-enter every credential each reboot —
  which is exactly the pain point the whole B-0852 cascade was
  designed to solve

This commit changes the default to `/boot/zeta-creds.enc` so the
service can actually find the blob it's supposed to decrypt. Also
expands the option description to explain the install-vs-installed
mount-path distinction so future maintainers don't reintroduce the
same confusion.

No changes to zeta-install.sh: the install-time write to
`/esp/zeta-creds.enc` is correct for the install-time context;
disko's later remount-as-/boot is what makes the file accessible
at the new path.

Validation:
- `nix-instantiate --parse zeta-creds-restore.nix` parses clean
  (no syntax change; only literal value + description text)
- Substrate-honest: this is a single-line semantic fix; the
  multi-line description expansion is wake-time substrate for the
  next maintainer who edits this module

Composes with #5640 (the row that surfaced the issue), #5643
(passphrase-env supersede), and the B-0852 cred-persistence cascade
(#5635 + #5637 + #5638 + #5639 + #5640 + #5641 + #5642).

Addresses CRITICAL Copilot finding on #5640.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fixup(b-0852.4): producer-side /esp → /mnt/boot + clean option doc (Copilot threads on #5644)

P1 — producer-side path mismatch ALSO needs fixing
  Prior commit fixed CONSUMER (restore service) but PRODUCER
  (Step 6.95-picker) was writing to /esp/zeta-creds.enc — which
  doesn't correspond to any mount. Target ESP is mounted at
  /mnt/boot during install (zeta-install.sh:226). Blob was
  landing on live USB rootfs, not target ESP. Reboot lost it.

  Fix: picker --output /esp/zeta-creds.enc → /mnt/boot/zeta-creds.enc.
  Producer now writes to target ESP mount, disko remounts as /boot
  post-reboot. Same physical file at two mount paths bridges the
  install-vs-installed boundary.
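
  The producer/consumer alignment can be summarized as one filename under
  two context-dependent ESP mount points. The sketch below is illustrative
  only: `BootContext`, `ESP_MOUNT`, and `credBlobPath` are hypothetical names,
  and the real values live in zeta-install.sh and the Nix module option.

```typescript
// "install" = live USB session; "installed" = the system after reboot.
type BootContext = "install" | "installed";

// Mount points as stated in this commit: the target ESP is /mnt/boot during
// install and /boot on the installed system.
const ESP_MOUNT: Record<BootContext, string> = {
  install: "/mnt/boot",
  installed: "/boot",
};

const BLOB_NAME = "zeta-creds.enc";

// Same physical file, addressed through whichever mount is active.
function credBlobPath(ctx: BootContext): string {
  return `${ESP_MOUNT[ctx]}/${BLOB_NAME}`;
}
```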

P2 — option doc style: strip PR-review history attribution
  Prior commit included "caught by Copilot review on PR #5640" in
  option doc. Repo convention: code/current-state docs use
  role-neutral present-tense contract text; PR-review history lives
  in commit messages + history surfaces.

  Fix: rewrite doc as present-tense contract for the option (what
  it configures + install-vs-installed mount convention for
  operators using non-default ESP layouts).

Validation: bash -n OK.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude <noreply@anthropic.com>
@AceHack AceHack review requested due to automatic review settings May 27, 2026 21:26
AceHack pushed a commit that referenced this pull request May 27, 2026
…D018 fix (Copilot 6 threads on #5648)

Comprehensive accuracy rewrite addressing all 6 Copilot findings:

1. "no more re-entering" overclaim — passphraseMode=interactive
   DOES prompt every boot via systemd-ask-password. Reframed
   accurately: N per-tool login flows → ONE cred-blob passphrase.
   The improvement is atomicity, not zero typing.

2. Install log lines mismatch — restored to match actual zeta-install.sh
   output (Step 6.56 + Step 6.95-picker actual strings).

3. /boot path correctness — preserved (#5644 already fixed
   producer/consumer alignment to /mnt/boot ↔ /boot).

4. Manifest coverage — included gemini + codex paths
   (~/.gemini/oauth_creds.json, ~/.codex/auth.json) plus the
   full default-manifest table.

5. Second-reboot expectation — corrected: interactive mode prompts
   every boot by design. Operator who wants no-prompt-at-boot can
   switch to passphraseMode="file" (with security tradeoff named).

6. Filename reference — zeta-creds-cli.ts → zeta-creds-manifest.ts
   (actual canonical location of defaultManifest).

Also fixes MD018 lint failure: line "#5639 + #5640 + #5643 + #5644 +"
was being parsed as an ATX heading because # was at column 1. Replaced
the line-wrapped PR-number prose with the default-manifest table
(more useful + no MD018 trigger).

Composes with:
- B-0852 cred-persistence cascade (PRs that ACTUALLY ship: #5635,
  #5637, #5639, #5640, #5641, #5642, #5644, #5645, #5646, #5648,
  #5649, #5650; #5638 + #5643 were superseded → closed without merge)
- common.nix passphraseMode=interactive default (PR #5640)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 27, 2026
…o-end verification checklist for operator) (#5648)

* docs(provisioning): add cred-restore smoke-test section — first-boot + post-reboot + second-reboot verification + troubleshooting table (B-0852 end-to-end)

The B-0852 cred-persistence cascade (PRs #5635 + #5637 + #5638 +
#5639 + #5640 + #5643 + #5644 + #5646) closes the operator's
'don't re-enter creds over and over' pain point. This docs addition
gives operators a concrete checklist to verify the full path works
after a fresh USB install:

- First-boot verification: what install log lines to look for
- Post-reboot verification: systemctl + ls + auth-status commands
- Second-reboot verification: confirm no re-entry needed
- Troubleshooting table: 4 common symptoms with likely causes

Closes the gap between 'cascade is shipped' and 'operator can
confirm cascade works on their hardware'. The operator no longer
has to figure out which systemd unit to query or which paths to
check — the checklist names them.

Composes with:
- PROVISIONING.md (existing operator-facing install doc)
- B-0852 cred-persistence substrate
- The audit-extension PR (separate; catches drift at CI time)

Substrate-honest scope: this is operator docs, not a TS tool. A
follow-on TS smoke-test runner (run on the installed system to
auto-verify the checklist) is a candidate for follow-up work but
out of scope for this commit.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fixup(docs): rewrite cred-restore smoke-test section for accuracy + MD018 fix (Copilot 6 threads on #5648)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 27, 2026
…re the just-written blob at install time (operator catches bad blob BEFORE reboot, not at first boot) (#5655)

Adds opt-in --verify flag to zeta-creds-picker.ts. When set, after
zeta-creds-persist succeeds, the picker spawns zeta-creds-restore.ts
with --dry-run + the same passphrase source + a tmpdir as
--target-root. If restore-dry-run exits 0, the blob is confirmed
cryptographically valid + manifest-parseable. If non-zero, the
operator sees an actionable error at install time + can re-run the
picker to retry.

Operator-experience improvement: without --verify, a corrupt blob
(wrong passphrase captured, disk write error, persist bug) only
surfaces at first reboot when zeta-creds-restore.service fails its
ConditionPathExists or scrypt-decrypt step. At that point the
operator must reboot back into the live USB + re-run the install.
With --verify, the same failure surfaces SECONDS after persist,
inside the running install flow, with the live USB still mounted.

New exit code 5 for verify-failed (distinct from persist-failed=4).

API addition:
- PickerArgs gains `verify: boolean` (default false; opt-in)
- New export buildVerifyArgs(parsed, tmpTargetRoot) — pure
  composer of the restore-CLI argv list; testable in isolation

Tests added (3 new + 2 parseArgs-extension):
- --verify flag default false
- --verify flag parsed when passed
- buildVerifyArgs composes restore-CLI args with --dry-run + tmpdir
- buildVerifyArgs propagates --passphrase-file when picker used file
- buildVerifyArgs propagates --persona when set

21 pass / 0 fail (was 16; +5).
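
A pure argv composer of the kind described above might look like the sketch below. It is an assumption, not the shipped implementation: the `PickerArgsSketch` interface and its `passphraseFile` and `persona` field names are hypothetical, the argument order is arbitrary, and the argument that tells the restore CLI where the blob lives is omitted because the commit message does not name it. Only the `--dry-run`, `--target-root`, `--passphrase-file`, and `--persona` flags come from the text.

```typescript
// Hypothetical subset of PickerArgs; only `verify` is named in the commit.
interface PickerArgsSketch {
  verify: boolean;
  passphraseFile?: string;
  persona?: string;
}

// Pure function: no spawning, no filesystem access, so it is testable in isolation.
function buildVerifyArgs(parsed: PickerArgsSketch, tmpTargetRoot: string): string[] {
  const argv = ["--dry-run", "--target-root", tmpTargetRoot];
  if (parsed.passphraseFile !== undefined) {
    argv.push("--passphrase-file", parsed.passphraseFile);
  }
  if (parsed.persona !== undefined) {
    argv.push("--persona", parsed.persona);
  }
  return argv;
}
```

Keeping the composer pure means the spawn step only has to pass the returned list to the restore CLI and map a non-zero exit to the verify-failed exit code.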

Substrate-honest scope: opt-in only. Future PR can flip default-on
after operator empirical testing confirms verify doesn't introduce
new failure modes (e.g., tmpdir permission, restore-CLI changes).
zeta-install.sh Step 6.95-picker currently does NOT pass --verify;
that flip can land in a follow-up after operator tests.

Composes with:
- B-0852 cred-persistence cascade (#5635 + #5637 + #5639 + #5640 +
  #5642 + #5644 + #5645 + #5646 + #5648 + #5649 + #5650)
- tools/installer/zeta-creds-restore.ts (existing --dry-run mode)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude <noreply@anthropic.com>